Knowledge-based Supervision for Domain-adaptive Semantic Role Labeling
Author
Abstract
Semantic role labeling (SRL) is a method for the semantic analysis of texts that adds a level of semantic abstraction on top of syntactic analysis, for instance adding semantic role labels like Agent on top of syntactic functions like Subject. SRL has been shown to benefit various natural language processing applications such as question answering, information extraction, and summarization. Automatic SRL systems are typically based on a predefined model of semantic predicate argument structure incorporated in lexical knowledge bases like PropBank or FrameNet. They are trained with supervised or semi-supervised machine learning methods on training data labeled with predicate (word sense) and role labels. Even state-of-the-art systems based on deep learning still rely on a labeled training set. However, despite their success in experimental settings, the real-world application of SRL methods is still hindered by severe coverage problems (the lexicon coverage problem) and by a lack of domain-relevant training data for supervised systems (the domain adaptation problem). These issues apply to English, but are even more severe for other languages, for which only small resources exist. The goal of this thesis is to develop knowledge-based methods to improve lexicon coverage and training data coverage for SRL. We use linked lexical knowledge bases to extend the lexicon coverage and as a basis for automatic training data generation across languages and domains. Links between lexical resources have previously been used to address this problem, but the linkings have not been explored and applied at a large scale, and the resulting generated training data contained only predicate (word sense) labels, but no role labels. To create predicate and role labels, corpus-based methods have been used. These rely on the existence of labeled training data as sources for label transfer to unlabeled corpora.
For certain languages, like German or Spanish, several lexical knowledge bases, but only small amounts of labeled training data exist. For such languages, knowledge-based methods promise greater improvements. In our experiments, we target FrameNet, a lexical-semantic resource with a strong focus on semantic abstraction and generalization, but the methods developed in this thesis can be extended to other models of predicate argument structure, like VerbNet and PropBank. This
Similar resources
Automatic Semantic Role Labeling of Persian Sentences Using Dependency Trees
Automatically identifying words with semantic roles (such as Agent, Patient, Source, etc.) in sentences and attaching the correct semantic roles to them may lead to improvements in many natural language processing tasks, including information extraction, question answering, text summarization, and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...
Semantic Role Labeling for Process Recognition Questions
We consider a 4th-grade-level question answering task. We focus on a subset involving recognizing instances of physical, biological, and other natural processes. Many processes involve similar entities and are hard to distinguish using simple bag-of-words representations alone. Simple semantic roles such as Input, Result, and Enabler can often capture the most critical bits of information about ...
Knowledge-Based Labeling of Semantic Relationships in English
An increasing number of NLP tasks require semantic labels to be assigned, not only to entities that appear in textual elements, but to the relationships between those entities. Interest is growing in shallow semantic role labeling as well as in deep semantic distance metrics grounded in ontologies, as each of these contributes to better understanding and organization of text. In this work I app...
Generating Training Data for Semantic Role Labeling based on Label Transfer from Linked Lexical Resources
We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., WordNet, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and ro...
Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach
Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...